1,103 research outputs found

    Graph ambiguity

    In this paper, we propose a rigorous way to define the concept of ambiguity in the domain of graphs. In past studies, the classical definition of ambiguity has been derived from fuzzy set and fuzzy information theories. Our aim is to show that, in the domain of graphs as well, it is possible to derive a formulation able to capture the same semantic and mathematical concept. To strengthen the theoretical results, we discuss the application of the graph ambiguity concept to the graph classification setting, conceiving a new kind of inexact graph matching procedure. The results prove that graph ambiguity is a characterizing and discriminative property of graphs. (C) 2013 Elsevier B.V. All rights reserved.

    Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome

    We evaluate a version of the recently proposed classification system named Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space of sequences of generic objects. The ODSE system was originally presented as a classification system for patterns represented as labeled graphs. However, since ODSE is founded on the dissimilarity space representation of the input data, the classifier can be easily adapted to any input domain where it is possible to define a meaningful dissimilarity measure. Here we demonstrate the effectiveness of the ODSE classifier for sequences by considering an application dealing with the recognition of the solubility degree of the Escherichia coli proteome. Solubility, or analogously aggregation propensity, is an important property of protein molecules, which is intimately related to the mechanisms underlying the chemico-physical process of folding. Each protein of our dataset is initially associated with a solubility degree and is represented as a sequence of symbols denoting the 20 amino acid residues. The computational results obtained here, which, we stress, have been achieved with no context-dependent tuning of the ODSE system, confirm the validity and generality of the ODSE-based approach for structured data classification. Comment: 10 pages, 49 references
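The dissimilarity space representation mentioned in the abstract above can be illustrated with a minimal sketch. It is not the ODSE system: the Levenshtein edit distance stands in for the dissimilarity measure, the prototypes are picked by hand rather than optimized, and the names `levenshtein` and `embed` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two sequences (assumed dissimilarity measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def embed(sequences, prototypes, dissimilarity=levenshtein):
    """Map each sequence to the vector of its dissimilarities to the prototypes."""
    return [[dissimilarity(s, p) for p in prototypes] for s in sequences]


# Toy amino-acid strings; the prototypes are chosen by hand for illustration only.
prototypes = ["MKVLA", "MAVLG"]
vectors = embed(["MKVLA", "MKVLG", "GGGGG"], prototypes)
```

Each row of `vectors` is an ordinary real-valued feature vector, so any standard classifier can be trained on it regardless of the original input domain.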

    Toward a multilevel representation of protein molecules: comparative approaches to the aggregation/folding propensity problem

    This paper builds upon the fundamental work of Niwa et al. [34], which provides a unique opportunity to analyze the relative aggregation/folding propensity of the elements of the entire Escherichia coli (E. coli) proteome in a cell-free standardized microenvironment. The hardness of the problem comes from the superposition between the driving forces of intra- and inter-molecule interactions, and it is mirrored by the evidence of shifts from folding to aggregation phenotypes caused by single-point mutations [10]. Here we apply several state-of-the-art classification methods from the field of structural pattern recognition, with the aim of comparing different representations of the same proteins gathered from the Niwa et al. database; such representations include sequences and labeled (contact) graphs enriched with chemico-physical attributes. Through this comparison, we are also able to identify some interesting general properties of proteins. Notably, (i) we suggest a threshold around 250 residues discriminating "easily foldable" from "hardly foldable" molecules, consistent with other independent experiments, and (ii) we highlight the relevance of contact graph spectra for folding behavior discrimination and characterization of the E. coli solubility data. The soundness of the experimental results presented in this paper is supported by the statistically relevant relationships discovered between the chemico-physical description of proteins and the developed cost matrix of substitution used in the various discrimination systems. Comment: 17 pages, 3 figures, 46 references

    On the long-term correlations and multifractal properties of electric arc furnace time series

    In this paper, we study long-term correlations and multifractal properties elaborated from time series of three-phase current signals coming from an industrial electric arc furnace plant. Implicit sinusoidal trends are detected by considering the scaling of the fluctuation functions. The time series are then filtered via a Fourier-based analysis, thereby removing these strong periodicities. In the filtered time series we detected long-term, positive correlations. The presence of positive correlations is in agreement with the typical V-I characteristic (hysteresis) of the electric arc furnace, thus providing a sound physical justification for the memory effects found in the current time series. The multifractal signature in the filtered time series is strong enough for them to be effectively classified as multifractal.

    Modeling and Recognition of Smart Grid Faults by a Combined Approach of Dissimilarity Learning and One-Class Classification

    Detecting faults in electrical power grids is of paramount importance, from both the electricity operator and consumer viewpoints. Modern electric power grids (smart grids) are equipped with smart sensors that gather real-time information regarding the physical status of all the component elements belonging to the whole infrastructure (e.g., cables and related insulation, transformers, breakers and so on). In real-world smart grid systems, additional information related to the operational status of the grid itself, such as meteorological information, is usually collected as well. Designing a suitable recognition (discrimination) model of faults in a real-world smart grid system is hence a challenging task. This follows first from the heterogeneity of the information that actually determines a typical fault condition. The second point is that, for synthesizing a recognition model, in practice only the conditions of observed faults are usually meaningful. Therefore, a suitable recognition model should be synthesized by making use of the observed fault conditions only. In this paper, we deal with the problem of modeling and recognizing faults in a real-world smart grid system, which supplies the entire city of Rome, Italy. Recognition of faults is addressed by following a combined approach of multiple dissimilarity measures customization and one-class classification techniques. We provide here an in-depth study of the available data and of the models synthesized by the proposed one-class classifier. We also offer a comprehensive analysis of the fault recognition results by exploiting a fuzzy set based reliability decision rule.

    Multifractal Characterization of Protein Contact Networks

    The multifractal detrended fluctuation analysis of time series is able to reveal the presence of long-range correlations and, at the same time, to characterize the self-similarity of the series. The rich information derivable from the characteristic exponents and the multifractal spectrum can be further analyzed to discover important insights about the underlying dynamical process. In this paper, we employ multifractal analysis techniques in the study of protein contact networks. To this end, a network is initially mapped to three different time series, each of which is generated by a stationary unbiased random walk. To capture the peculiarities of the networks at different levels, we accordingly consider three observables at each vertex: the degree, the clustering coefficient, and the closeness centrality. To compare the results with suitable references, we also consider instances of three well-known network models and two typical time series with pure monofractal and multifractal properties. The first result of notable interest is that time series associated with protein contact networks exhibit long-range correlations (strong persistence), which are consistent with signals in between the typical monofractal and multifractal behavior. Subsequently, a suitable embedding of the multifractal spectra allows us to focus on ensemble properties, which in turn gives us the possibility to make further observations regarding the considered networks. In particular, we highlight the different roles that small and large fluctuations of the considered observables play in the characterization of the network topology.
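The network-to-time-series mapping described in the abstract above can be sketched as follows. This is an illustrative reading, not the authors' code: the graph is assumed to be connected, undirected, and stored as a dict of neighbor sets, "stationary" is interpreted as drawing the start vertex with probability proportional to its degree, and all function names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def degree(adj, v):
    return len(adj[v])


def clustering(adj, v):
    """Fraction of neighbor pairs of v that are themselves connected."""
    nbrs = list(adj[v])
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (len(nbrs) * (len(nbrs) - 1))


def closeness(adj, v):
    """(reachable vertices) / (sum of shortest-path lengths from v), via BFS."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def walk_series(adj, steps, seed=0):
    """Run one unbiased random walk and record the three observables along it."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    # Assumed stationary start: first vertex drawn proportionally to degree.
    current = rng.choices(nodes, weights=[len(adj[v]) for v in nodes])[0]
    observables = {"degree": degree, "clustering": clustering, "closeness": closeness}
    # Each observable is computed once per vertex, then looked up during the walk.
    values = {name: {v: f(adj, v) for v in nodes} for name, f in observables.items()}
    path, series = [], {name: [] for name in observables}
    for _ in range(steps):
        path.append(current)
        for name in observables:
            series[name].append(values[name][current])
        current = rng.choice(sorted(adj[current]))  # unbiased: uniform over neighbors
    return path, series
```

The three lists in `series` are the time series that a multifractal analysis would then take as input; that analysis step is not shown here.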

    A generative model for protein contact networks

    In this paper we present a generative model for protein contact networks. The soundness of the proposed model is investigated by focusing primarily on mesoscopic properties elaborated from the spectra of the graph Laplacian. To complement the analysis, we also study classical topological descriptors, such as statistics of the shortest paths and the important feature of modularity. Our experiments show that the proposed model yields a considerable improvement over two suitably chosen generative mechanisms, mimicking real protein contact networks more closely in terms of diffusion properties elaborated from the Laplacian spectra. However, like the other models considered, it does not reproduce the shortest-path structure with sufficient accuracy. To compensate for this drawback, we designed a second step involving a targeted edge reconfiguration process. The ensemble of reconfigured networks shows improvements that are statistically significant. As a byproduct of our study, we demonstrate that modularity, a well-known property of proteins, does not entirely explain the actual network architecture characterizing protein contact networks. In fact, we conclude that modularity, intended as a quantification of an underlying community structure, should be considered an emergent property of the structural organization of proteins. Interestingly, such a property is suitably optimized in protein contact networks together with the feature of path efficiency. Comment: 18 pages, 67 references

    Cancer diagnosis using deep learning: A bibliographic review

    In this paper, we first describe the basics of the field of cancer diagnosis, which includes the steps of cancer diagnosis followed by the typical classification methods used by doctors, giving readers a historical overview of cancer classification techniques. These methods include the Asymmetry, Border, Color and Diameter (ABCD) method, the seven-point detection method, the Menzies method, and pattern analysis. They are used regularly by doctors for cancer diagnosis, although they are not considered very efficient. Moreover, to serve all types of audience, the basic evaluation criteria are also discussed. The criteria include the receiver operating characteristic curve (ROC curve), area under the ROC curve (AUC), F1 score, accuracy, specificity, sensitivity, precision, Dice coefficient, average accuracy, and Jaccard index. Because the previously used methods are considered inefficient, better and smarter methods for cancer diagnosis are needed. Artificial intelligence applied to cancer diagnosis is gaining attention as a way to define better diagnostic tools. In particular, deep neural networks can be successfully used for intelligent image analysis. The basic framework of how this machine learning works on medical imaging is provided in this study, i.e., pre-processing, image segmentation and post-processing. The second part of this manuscript describes the different deep learning techniques, such as convolutional neural networks (CNNs), generative adversarial models (GANs), deep autoencoders (DANs), restricted Boltzmann machine (RBM), stacked autoencoders (SAE), convolutional autoencoders (CAE), recurrent neural networks (RNNs), long short-term memory (LSTM), multi-scale convolutional neural network (M-CNN), and multi-instance learning convolutional neural network (MIL-CNN). For each technique, we provide Python codes, to allow interested readers to experiment with the cited algorithms on their own diagnostic problems. The third part of this manuscript compiles the successfully applied deep learning models for different types of cancers. Considering the length of the manuscript, we restrict ourselves to the discussion of breast cancer, lung cancer, brain cancer, and skin cancer. The purpose of this bibliographic review is to provide researchers who choose to work on implementing deep learning and artificial neural networks for cancer diagnosis with a from-scratch knowledge of the state-of-the-art achievements.
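Several of the evaluation criteria listed in the abstract above follow directly from the four counts of a binary confusion matrix. The sketch below is not taken from the reviewed paper: it assumes binary labels with 1 as the positive class, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Common diagnostic metrics for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "jaccard": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }


scores = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```

For binary labels the Dice coefficient and the F1 score coincide; they are reported separately only because segmentation work usually names the former and classification work the latter.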

    An Agent-Based Algorithm exploiting Multiple Local Dissimilarities for Clusters Mining and Knowledge Discovery

    We propose a multi-agent algorithm able to automatically discover relevant regularities in a given dataset, determining at the same time the set of configurations of the adopted parametric dissimilarity measure that yield compact and separated clusters. Each agent operates independently by performing a Markovian random walk on a suitable weighted graph representation of the input dataset. Such a weighted graph representation is induced by the specific parameter configuration of the dissimilarity measure adopted by the agent, which searches and takes decisions autonomously for one cluster at a time. Results show that the algorithm is able to discover parameter configurations that yield a consistent and interpretable collection of clusters. Moreover, we demonstrate that our algorithm achieves performance comparable to that of other similar state-of-the-art algorithms when facing specific clustering problems.